Abstract

We prove that the recently proposed informational herding models are but special cases of a standard single-person experimentation model with myopia. We then re-interpret the incorrect herding outcome as a familiar failure of complete learning in an optimal experimentation problem.

We next explore that experimentation model with patient individuals or, equivalently, the observational learning model where individuals care about successors. We find that even when individuals internalize the herding externality in this fashion, incorrect herds and incomplete learning still obtain. We note that this outcome can be implemented as a constrained social optimum when decentralized by transfers.


*The proposed mapping in this paper first appeared in July 1995 as a later section of our companion paper, "Pathological Outcomes of Observational Learning." For the current full-length paper, we thank Abhijit Banerjee, Christopher Wallace, and the MIT theory lunch for comments. Smith gratefully acknowledges financial support for this work from the NSF, and Sørensen equally thanks the Danish Social Sciences Research Council.
And out of old bookes, in good faithe,
Cometh al this new science that men lere.

Geoffrey Chaucer¹

1. INTRODUCTION

The last few years have seen a flood of research on a paradigm known as informational herding. We ourselves have actively participated in this research herd (Smith and Sørensen (1996a), or simply SS), which was sparked independently by Banerjee (1992) and BHW: Bikhchandani, Hirshleifer, and Welch (1992). The context is seductively simple: an infinite sequence of individuals must each choose an action from a finite menu. Everyone has identical preferences and menus, and each may condition his decision both on his (endowed) private signal about the state of the world and on all his predecessors' decisions; however, observation of their private signals is verboten.

If these private signals have bounded power, it is known that a 'herd' eventually arises but is not always correct: after some point, everyone makes the identical choice, possibly an unwise one. This simple pathological outcome has understandably attracted much fanfare.

In SS we explored, embellished, and fully fleshed out this story, and found that herding is quite robust. BHW's kindred notion of cascades is not as resilient: while beliefs converge to a limit where only one action is taken with probability one (a limit cascade), this need not occur in finite time. The potential for bad herds also comes with a caveat: absent a uniform bound on the strength of the individuals' private signals, only correct herds arise. Finally, in a world with multiple preference types, a confounded learning outcome might occur, where the lesson of history is forever mixed and private signals are always decisive.

A Possible Link, and a Puzzle. Robustness aside, this paper pursues a different train of thought: is informational herding fundamentally a new phenomenon? Our curiosity was piqued by its similarity to the familiar failure of complete learning in an optimal experimentation problem. One classic example is Rothschild's (1974) analysis of the two-armed bandit: an infinite-lived impatient monopolist optimally experiments with two possible prices each period, with purchase chances for each price being fixed, unknown draws from $[0, 1]$. Rothschild showed that the monopolist would (i) eventually settle down on one of the prices, and (ii) with positive probability select the less profitable price.

Aiming for more than a casual analogy between this outcome and herding, we must recast the observational learning paradigm as a single-person optimization problem. This suggests considering the forgetful experimenter, who each period receives a new informative signal, takes an optimal action, and then promptly forgets his signal; the next period, he can reflect only on his action choice. Alas, this is neither a very satisfying model of rationality nor a possible basis for an optimal experimentation problem. How then can an experimenter not observe the private signals, and yet take informative actions?

¹ See The Assembly of Fowles, line 22.

The Second Puzzle, and a Resolution. An interesting sequel to Rothschild's work was McLennan (1984), who permitted the monopolist the flexibility to charge one of a continuum of prices, but assumed only two possible linear purchase chance 'demand curves'. When the demand curves crossed, he found that the monopolist may well settle down on the suboptimal uninformative price.

Rothschild's and McLennan's models give examples of potentially confounding actions, as introduced in EK: Easley and Kiefer (1988). In brief, these are actions that are optimal for unfocused beliefs for which they are invariant (i.e. taking the action leaves the beliefs unchanged). Of particular significance is the proof in EK (on page 1059) that with finite state and action spaces, generically no potentially confounding actions exist, and thus complete learning must arise.² Rothschild and McLennan might be seen as separate anticipations of EK's general insight. Rothschild escapes it by means of a continuous state space, whereas McLennan resorts to a continuous action space. Yet there appears to be no escape for the herding paradigm, where both flavors of incomplete learning, limit cascades and confounded learning, generically arise with two actions and two states.

Our resolution of these puzzles respects the quintessential feature of the herding paradigm: predecessors' signals are hidden from view. In a nutshell, we imagine that individuals do not act for themselves, but rather furnish optimal history-contingent 'decision rules' for agent machines that automatically map any realized private signal into an action choice. With this device, informational herding is properly understood as a new (camouflaged) context for an old phenomenon: incomplete learning by an infinite-lived experimenter.

Herding with Forward-Looking Behavior. The herding outcome is a striking example of an informational externality: while all individuals collectively know enough to fully determine the state of the world, this information is aggregated rather poorly. For in a herd, most individuals almost surely take an action that reveals almost none of their information. Late-comers would prefer that their predecessors had better signalled their information with more revealing actions, but early individuals clearly have no incentive to do so.

Notwithstanding the motivation of this research, some may therefore find the second half of this paper, where we investigate the herding externality with forward-looking behavior, more compelling. Indeed, once the problem is recast as one of optimal individual experimentation, it is not implausible that learning is incomplete simply because of myopia. For, as is well known, a rational experimenter is willing to forgo some current payoff in order to secure knowledge relevant for future payoffs and decisions. We prove, however, that in the experimentation model without myopia, learning is still not complete.³

² For instance, payoff assignments in a one-armed bandit (where the safe arm is a potentially confounding action) are not generic in $\mathbb{R}^2$.

Herding itself is best captured by an equivalent social planner's problem, where the goal is to maximize the present discounted value of individuals' welfares. The social planner likewise will sacrifice early payoffs for the informational benefit of successors. Such an outcome can be decentralized as a constrained social optimum by means of a simple set of budget-balanced transfers that encourage experimentation. Yet we find that the herding externality is only imperfectly mitigated by the social planner: incorrect herds still obtain, but with chances vanishing as the discount factor converges to one.

Overview. Section 2 outlines a general observational learning model. In section 3, we re-interpret the herding paradigm as one of optimal single-person experimentation. Section 4 characterizes the limiting beliefs of that experimenter, and then reverts to the social planner's herding problem to characterize the limiting actions. A conclusion gives a broader perspective on our findings. Longer but essential proofs are appendicized.

2. OBSERVATIONAL LEARNING MODEL

In this section we set up a very general observational learning model, which subsumes SS, and thus BHW and Banerjee (1992), and also happens to cover Lee (1993). This generality facilitates the ensuing mapping into the experimentation literature.

Information. An infinite sequence of individuals $n = 1, 2, \ldots$ takes actions in that exogenous order. There is uncertainty about the payoffs from these actions. The elements of the parameter space $(\Theta, \mathcal{T})$ are referred to as states of the world. There is a given common prior belief, the probability measure $\lambda$ over $\Theta$.

Individual $n$ receives a private random signal, $\sigma_n \in \Sigma$, about the state of the world. As demonstrated in SS (Lemma 1), we may assume WLOG that the private signal received by an individual is actually his private belief, i.e. we let $\sigma$ be the measure over $\Theta$ which results from Bayesian updating given the prior $\lambda$ and observation of the private signal. Signals thus belong to $\Sigma$, the space of probability measures over $(\Theta, \mathcal{T})$, and $\mathcal{S}$ is the associated sigma-algebra. Conditional on the state, the signals are assumed to be i.i.d. across individuals, distributed according to the probability measure $\mu^\theta$ in state $\theta \in \Theta$. To avoid trivialities, assume that not all $\mu^\theta$ are identical, so that some signals are informative. Each distribution may contain atoms, but to ensure that no signal will perfectly reveal the state of the world, we insist that all $\mu^\theta$, $\theta \in \Theta$, be mutually absolutely continuous (a.c.).⁴

³ Just as in SS, when private signals are not uniformly bounded in power, learning is complete.

Bayesian Decision-Making. Everyone chooses from a fixed action set $A$, equipped with the sigma-algebra $\mathcal{A}$. Action $a$ earns a nonstochastic payoff $u(a, \theta)$ in state $\theta \in \Theta$, the same for all individuals, where $u : A \times \Theta \to \mathbb{R}$ is measurable. It is common knowledge that everyone is rational, i.e. seeks to maximize his expected payoff. Before deciding upon an action, everyone first observes his private signal/belief and the entire action history $h$.

An individual's Bayes-optimal decision rule uses the observed action history and his own private belief. As in SS, we simply assume that an individual can compute the decision rules of all predecessors, and can use the common prior to calculate the ex ante (time-0) probability distribution over action profiles $h$ in every state. Given these probabilities, Bayes' rule implies a unique public belief $\pi = \pi(h) \in \Sigma$ for any history $h$.⁵ A final application of Bayes' rule, now also conditioning on the private signal $\sigma$, yields the posterior belief $\rho \in \Sigma$.

Given the posterior belief $\rho \in \Sigma$, individual $n$ picks the action $a \in A$ which maximizes his expected payoff $u_a(\rho) = \int_\Theta u(a, \theta)\, d\rho(\theta)$. We assume that an optimal action $a = a(\rho)$ always exists.⁶ The solution defines an optimal decision rule $x : \Sigma \to \Delta(A)$, where $\Delta(A)$ is the space of probability measures over $(A, \mathcal{A})$. That is, $x$ is an element of the space $\mathcal{X}$ of maps from $\Sigma$ to $\Delta(A)$. A rule $x$ thus specifies, simultaneously for all private beliefs $\sigma$, an implied distribution $\nu = x(\sigma)$ over actions. The optimal $x$ clearly depends on the public belief $\pi$.

Observational Learning as a Stochastic Process. Since the probability measure of signals $\sigma$ depends on the state $\theta$, the implied distribution over actions $a$ depends on both $\theta$ and the optimal decision rule $x$. In fact, the density is $\psi(a \mid \theta, x) = \int_\Sigma x(\sigma)(a)\, \mu^\theta(d\sigma)$, and unconditional on the state, it is $\psi(a \mid \pi, x) = \int_\Theta \psi(a \mid \theta, x)\, \pi(d\theta)$. This in turn yields a distribution over next period public beliefs $\pi_{n+1}$. Thus, $(\pi_n)$ follows a discrete-time Markov process with state-dependent transition probabilities. The next result is standard:

Lemma 1 The belief process $(\pi_n)$ is a martingale unconditional on the state, converging a.s. to some limiting random variable $\pi_\infty$. The limit $\pi_\infty$ places no weight on point masses on the wrong states of the world.
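To make the density $\psi(a \mid \theta, x)$ and Lemma 1 concrete, here is a minimal Python sketch for the two-state case treated in section 4. All numbers are hypothetical choices for illustration: private beliefs $\sigma \in \{3/10, 7/10\}$ with chances chosen so that $d\mu^H/d\mu^L = \sigma/(1-\sigma)$, two actions `a1` and `a2`, and a rule that takes `a2` exactly when $\sigma$ exceeds a cutoff. Exact rational arithmetic (`fractions.Fraction`) lets the martingale property be checked as an equality rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical bounded private beliefs: sigma is 3/10 or 7/10, and
# MU[theta][sigma] is its chance in state theta, chosen so that
# MU["H"][s] / MU["L"][s] == s / (1 - s), as a private belief requires.
MU = {
    "H": {F(3, 10): F(3, 10), F(7, 10): F(7, 10)},
    "L": {F(3, 10): F(7, 10), F(7, 10): F(3, 10)},
}


def threshold_rule(cut):
    """Decision rule x: map sigma to a distribution over actions (here pure)."""
    return lambda sigma: {"a2": 1} if sigma > cut else {"a1": 1}


def psi(action, theta, rule):
    """psi(a | theta, x): sum over sigma of x(sigma)(a) * mu^theta(sigma)."""
    return sum(p * rule(s).get(action, 0) for s, p in MU[theta].items())


def psi_uncond(action, pi, rule):
    """psi(a | pi, x): action chance averaged over the public belief pi = Pr(H)."""
    return pi * psi(action, "H", rule) + (1 - pi) * psi(action, "L", rule)


def update(pi, action, rule):
    """Next public belief after observing the action (Bayes' rule)."""
    return pi * psi(action, "H", rule) / psi_uncond(action, pi, rule)


def expected_next_belief(pi, rule, actions=("a1", "a2")):
    """E[pi_{n+1} | pi_n = pi], over actions taken with positive chance."""
    return sum(psi_uncond(a, pi, rule) * update(pi, a, rule)
               for a in actions if psi_uncond(a, pi, rule) > 0)


rule = threshold_rule(F(1, 2))
print(update(F(1, 2), "a2", rule))          # 7/10
print(expected_next_belief(F(2, 5), rule))  # 2/5, the martingale property
```

The expected next belief equals the current one for every interior $\pi$, which is the unconditional martingale property of Lemma 1 in this discrete setting.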


⁴ See Rudin (1987). Measure $\mu^1$ is a.c. w.r.t. $\mu^2$ if $\mu^2(S) = 0 \Rightarrow \mu^1(S) = 0$ for all $S \in \mathcal{S}$, the sigma-algebra on $\Sigma$. By the Radon-Nikodym Theorem, a unique $g \in L^1(\mu^2)$ exists with $\mu^1(S) = \int_S g\, d\mu^2$ for every $S \in \mathcal{S}$. With $\mu^H$, $\mu^L$ mutually a.c., 'almost sure' assertions are well-defined without specifying the measure.

⁵ If one wishes to pursue this angle, this is the unique Bayesian equilibrium of the 'game'.

⁶ Absent a unique solution, we must take an arbitrary measurable selection from the solution correspondence.


Observational Learning Model                   | Impatient Experimenter Model
-----------------------------------------------|-------------------------------------------------
States $\theta \in \Theta$                     | States $\theta \in \Theta$
Belief after $n$ individuals, $\pi_n$          | Belief after $n$ observations, $\pi_n$
Optimal decision rule $x \in \mathcal{X}$      | Optimal action $x \in \mathcal{X}$
Private signal of individual $n$, $\sigma_n$   | Randomness in the $n$th experiment
Action taken by individual, $a \in A$          | Observable signal $a \in A$
Density over actions $\psi(a \mid \theta, x)$  | Density over observables $\psi(a \mid \theta, x)$
Payoffs (private information)                  | Payoffs (unobserved)

Table 1: Embedding. This table displays how our single-type observational learning model fits into the impatient single-person experimentation model.

3. MODEL COMPARISON

A first step in recasting our general model of observational learning as a single-person experimentation problem is to respect the individuals' selfishness. Thus, we must study an impatient experimenter with discount factor 0 (no 'active experimentation').

But to avoid a forgetful experimenter, we must regard the observational learning story from a new perspective. Consider individual $n$, who uses both the public belief $\pi_n$ and his private signal $\sigma_n$ in forming and acting upon his posterior belief $\rho_n$. We may separate these two steps using the conditional independence (given the state) of $\pi_n$ and $\sigma_n$. Mr. $n$ can be regarded as: (i) observing $\pi_n$, but not his private signal; (ii) optimally determining the rule $x \in \mathcal{X}$, and submitting it to an agent 'choice' machine; and (iii) letting that machine observe his private signal and take his action $a \in A$ for him. The ultimate payoff $u(a, \theta)$ is unobserved, lest it provide an additional signal of the state of the world. If private beliefs $\sigma$ have distribution $\mu^\theta$ in state $\theta$, the impatient experimenter will choose the same optimal decision rule $x$ described in section 2, resulting in action $a \in A$ with chance $\psi(a \mid \theta, x)$.

We now precisely describe the single-person experimentation model. The state space is $\Theta$. In period $n$, the experimenter $\mathcal{EX}$ chooses an action (the rule) $x \in \mathcal{X}$. Given this choice, a random observable statistic $a \in A$ is realized with chance $\psi(a \mid \theta, x)$ in state $\theta$. Finally, $\mathcal{EX}$ updates beliefs using this information alone.⁷ Table 1 summarizes the embedding.

Notice how this addresses both lead puzzles. First, the experimenter never knows the private beliefs $\sigma$, and thus does not forget them. Second, the pathological learning outcomes are entirely consistent with EK's generic complete learning result for models with finite action and state spaces. Simply put, when one rewrites the observational learning model as an experimentation model, actions map not to actions but to signals. The true action space for $\mathcal{EX}$ is the infinite space $\mathcal{X}$ of decision rules.

⁷ This experimentation model does not strictly fit into the EK mold, where the instantaneous reward in period $n$ depends only on the action and the observed signal, but (unlike here) not on the parameter $\theta \in \Theta$. Ours is instead the structure of Aghion, Bolton, Harris, and Jullien (1991) (ABHJ), where payoffs are not necessarily observed. If we wish, we may simply posit that the experimenter has fair insurance, and so earns his expected payoff each period rather than his random realized payoff. Then his behaviour will be exactly the same, yet he will learn nothing from the payoff.

SS considered two major modifications of the informational herding paradigm. One was to add i.i.d. noise to the individual decision problem. Noise is easily incorporated here by adding an exogenous chance of a noisy signal (a random action). SS also allowed for $T$ different types of preferences, with each individual randomly drawn from one of the type populations. Multiple types can be addressed here by simply imagining that $\mathcal{EX}$ chooses a $T$-vector of optimal decision rules from $\mathcal{X}^T$, with (only) the choice machine observing the type and private belief, and choosing the action $a$ as before.
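One way to read "an exogenous chance of a noisy signal" is sketched below in Python. The uniform form of the noise and the value of `eps` are assumptions for illustration, not SS's exact specification: with chance `eps` the observed action is drawn uniformly from the $M$ actions, and otherwise it is the one chosen by the machine. The noisy chances then enter the same Bayes update of the public belief.

```python
from fractions import Fraction as F


def noisy_psi(psi_clean, eps):
    """Mix clean action chances {action: psi(a | theta, x)} with uniform noise.

    eps is a hypothetical probability that the observed action is random.
    """
    m = len(psi_clean)
    return {a: (1 - eps) * p + eps / m for a, p in psi_clean.items()}


def update(pi, action, psi_h, psi_l):
    """Bayes update of the public belief pi = Pr(H) after observing action."""
    num = pi * psi_h[action]
    return num / (num + (1 - pi) * psi_l[action])


# Hypothetical clean chances of two actions under some fixed rule x.
clean_h = {"a1": F(3, 10), "a2": F(7, 10)}
clean_l = {"a1": F(7, 10), "a2": F(3, 10)}
eps = F(1, 10)

print(update(F(1, 2), "a2", clean_h, clean_l))                                 # 7/10
print(update(F(1, 2), "a2", noisy_psi(clean_h, eps), noisy_psi(clean_l, eps)))  # 17/25
```

The noisy observation moves the public belief less (to 17/25 rather than 7/10), since part of each action's chance no longer depends on the state.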


4. PATIENCE

4.1 Special Assumptions

While we have expressed observational learning rather generally as a type of myopic experimentation, our analytical results will require a more restricted model typical of the herding paradigm. For instance, if $(\Theta, \mathcal{T})$ is a continuum subset of $\mathbb{R}$ equipped with the Borel sigma-algebra, then the space of mappings $\mathcal{X}$ is rather unwieldy: it is the space of measurable mappings from the measures $\Sigma$ over $\Theta$ to the simplex $\Delta(A)$.

So, just as in SS, we assume that $\Theta = \{H, L\}$ (or, more generally, that it is finite). High and low states are equally likely ex ante, and so have prior $\lambda(L) = \lambda(H) = 1/2$. Private belief $\sigma$ expresses the chance of state $H$; thus $\Sigma$ is the interval $[0, 1]$, and $\mathcal{S}$ the Borel sigma-algebra over $[0, 1]$. Let $\mathrm{supp}(\mu)$ be the (common) support⁸ of each probability measure $\mu^\theta$. If $\mathrm{supp}(\mu) \subset (0, 1)$, then we call the private beliefs bounded, while if $\mathrm{co}(\mathrm{supp}(\mu)) = [0, 1]$, they are unbounded. In the latter case, arbitrarily strong private signals/beliefs can occur.

Lee (1993) showed that with these restrictions, a continuous action space can easily allow for simple statistical learning: given individual $n$'s public belief, his action perfectly reveals his private signal. We thus make the standard lumpiness assumption that $A = \{a_1, \ldots, a_M\}$ is finite. We assume that no action is dominated by the others, which implies the simple interval structure that action $a_m$ is optimal exactly when the posterior $\rho$ lies in some sub-interval of $[0, 1]$. We order the actions such that $a_m$ is myopically optimal for posteriors $\rho \in [\bar{r}_{m-1}, \bar{r}_m]$, where $0 = \bar{r}_0 < \bar{r}_1 < \cdots < \bar{r}_M = 1$.
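The cutoffs $\bar{r}_m$ follow from the affine payoffs $u_{a_m}(\rho) = \rho\, u(a_m, H) + (1-\rho)\, u(a_m, L)$: adjacent actions tie where these lines cross. A short Python sketch follows; the three-action payoff table is hypothetical, and the code assumes the actions are already ordered so that higher actions pay more in state $H$ and less in state $L$.

```python
from fractions import Fraction as F

# Hypothetical payoffs (u(a_m, H), u(a_m, L)) for M = 3 actions, ordered so
# that higher actions pay more in state H and less in state L.
PAYOFFS = [(F(0), F(2)), (F(3, 2), F(3, 2)), (F(2), F(0))]


def expected_payoff(m, rho, payoffs=PAYOFFS):
    """u_{a_m}(rho) = rho * u(a_m, H) + (1 - rho) * u(a_m, L), affine in rho."""
    u_h, u_l = payoffs[m]
    return rho * u_h + (1 - rho) * u_l


def myopic_cutoffs(payoffs=PAYOFFS):
    """Posterior cutoffs r_1 < ... < r_{M-1} at which a_m and a_{m+1} tie."""
    cutoffs = []
    for (uh0, ul0), (uh1, ul1) in zip(payoffs, payoffs[1:]):
        gain_h = uh1 - uh0  # extra payoff of the higher action in state H
        gain_l = ul0 - ul1  # extra payoff of the lower action in state L
        # Tie: rho * gain_h = (1 - rho) * gain_l.
        cutoffs.append(gain_l / (gain_h + gain_l))
    # Every action owns a nondegenerate interval only if cutoffs strictly increase.
    assert all(a < b for a, b in zip(cutoffs, cutoffs[1:])), "dominated action"
    return cutoffs


def myopic_action(rho, payoffs=PAYOFFS):
    """Index m of an action maximizing u_{a_m}(rho); lowest index on ties."""
    return max(range(len(payoffs)), key=lambda m: expected_payoff(m, rho, payoffs))
```

With these payoffs, `myopic_cutoffs()` returns `[Fraction(1, 4), Fraction(3, 4)]`, so $a_1$ is optimal on $[0, 1/4]$, $a_2$ on $[1/4, 3/4]$ and $a_3$ on $[3/4, 1]$.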


⁸ As usual, the support of a probability measure on the Borel sigma-algebra is the smallest closed set of measure 1.


4.2 The Patient Experimenter

Since our experimentation model admits any discount factor $\delta \in [0, 1)$, one naturally wonders what happens with a patient experimenter (when $\delta > 0$). SS show that the myopic experimenter maps higher signals into higher actions: there are thresholds $0 = x_0 < x_1 < \cdots < x_M = 1$, depending on $\pi$ alone, such that action $a_m$ is strictly optimal when $\sigma \in (x_{m-1}, x_m)$, and indifference between $a_m$ and $a_{m+1}$ prevails at the knife-edge $\sigma = x_m$. This is also true with patience, as (appendicized) Lemma A.1 proves. Intuitively, not only does the interval structure maximize immediate payoffs, but it also ensures the greatest future value of information, by producing the riskiest posterior belief distribution.
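For the myopic experimenter, the thresholds can be computed in closed form by inverting the posterior map $\rho(\pi, \sigma) = \pi\sigma/[\pi\sigma + (1-\pi)(1-\sigma)]$ (valid under the equal prior of section 4.1): $x_m$ is the private belief at which $\rho(\pi, x_m) = \bar{r}_m$, i.e. $x_m = \bar{r}_m(1-\pi)/[\pi(1-\bar{r}_m) + \bar{r}_m(1-\pi)]$. The Python sketch below covers only $\delta = 0$; with patience, the thresholds of Lemma A.1 also depend on the value function and are not computed here. The cutoff values in the usage note are hypothetical.

```python
from fractions import Fraction as F


def posterior(pi, sigma):
    """rho(pi, sigma) = Pr(H) given public belief pi and private belief sigma."""
    return pi * sigma / (pi * sigma + (1 - pi) * (1 - sigma))


def private_threshold(pi, r):
    """Private belief x solving rho(pi, x) = r, for pi and r strictly inside (0, 1)."""
    return r * (1 - pi) / (pi * (1 - r) + r * (1 - pi))


def myopic_thresholds(pi, cutoffs):
    """Thresholds 0 = x_0 < x_1 < ... < x_M = 1 of the myopic rule at belief pi."""
    return [F(0)] + [private_threshold(pi, r) for r in cutoffs] + [F(1)]
```

For instance, `myopic_thresholds(F(1, 2), [F(1, 4), F(3, 4)])` returns the cutoffs themselves, since at $\pi = 1/2$ the public belief is uninformative, while at $\pi = 1/3$ the first threshold rises from 1/4 to `private_threshold(F(1, 3), F(1, 4)) == F(2, 5)`: a more pessimistic public belief demands stronger private evidence for the higher action.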

We now turn to $\mathcal{EX}$'s long-run behavior. Lemma 1 tells us that beliefs must settle down and that $\mathcal{EX}$ is never dead wrong about the state. The next result expresses EK's Theorem 5: the limiting belief $\pi_\infty$ precludes further learning.

Proposition 1 (Absorbing Basins) For each $a_m \in A$, a possibly empty interval $J_m(\delta) \subset [0, 1]$ exists such that when $\pi \in J_m(\delta)$, $\mathcal{EX}$ optimally chooses an $x$ which a.s. induces $a_m$.

• For all $\delta \in [0, 1)$, the limit belief $\pi_\infty$ is concentrated on the basins $J_1(\delta) \cup \cdots \cup J_M(\delta)$.

• With unbounded private beliefs, $J_1(\delta) = \{0\}$ and $J_M(\delta) = \{1\}$; all other $J_m(\delta)$ are empty.

• If the private beliefs are bounded, then $J_1(\delta) = [0, \underline{\pi}(\delta)]$ and $J_M(\delta) = [\bar{\pi}(\delta), 1]$, where $0 < \underline{\pi}(\delta) < \bar{\pi}(\delta) < 1$. The larger is $\delta$, the smaller are all basins. For large enough $\delta$, all basins disappear except for $J_1$ and $J_M$, while $\lim_{\delta \uparrow 1} J_1(\delta) = \{0\}$ and $\lim_{\delta \uparrow 1} J_M(\delta) = \{1\}$.
Proof: This characterization of the stationary points of the stochastic process of beliefs $(\pi_n)$ directly generalizes the analysis for $\delta = 0$ in SS. See figure 1 for an illustration of how the basins are determined from the shape of the optimal value function. All results except the first bullet, on the limit belief, are established in the appendix. To see why that one is true (that a limit cascade must occur, as SS call it), observe that for any belief $\pi$ not in any basin, at least two signals in $A$ are realized with positive probability. By the interval structure, the highest such signal is more likely in state $H$, and the lowest more likely in state $L$. It follows that next period's belief differs from $\pi$ with positive probability. Intuitively, or by the characterization result for Markov-martingale processes in appendix B of SS, $\pi$ cannot lie in the support of $\pi_\infty$. □
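In the myopic case $\delta = 0$ the basins can be computed directly, as in SS: since the posterior rises with both $\pi$ and $\sigma$, $a_m$ is chosen almost surely once even the lowest private belief in the support lifts the posterior to $\bar{r}_{m-1}$ and even the highest keeps it at or below $\bar{r}_m$. The Python sketch below assumes bounded private beliefs with support endpoints `s_min` and `s_max` (both strictly inside $(0, 1)$) and reports closed intervals; it does not compute $J_m(\delta)$ for $\delta > 0$, which requires the value function.

```python
from fractions import Fraction as F


def public_belief_at(r, sigma):
    """Public belief pi at which private belief sigma gives posterior exactly r."""
    return r * (1 - sigma) / (sigma * (1 - r) + r * (1 - sigma))


def myopic_basins(cutoffs, s_min, s_max):
    """Basins J_m(0) for delta = 0 with private beliefs supported in [s_min, s_max].

    The posterior rises with pi, so a_m is taken a.s. exactly when
    rho(pi, s_min) >= r_{m-1} and rho(pi, s_max) <= r_m. Returns one closed
    interval (lo, hi) per action, or None where the basin is empty.
    """
    r = [F(0)] + list(cutoffs) + [F(1)]
    basins = []
    for m in range(1, len(r)):
        lo = public_belief_at(r[m - 1], s_min)
        hi = public_belief_at(r[m], s_max)
        basins.append((lo, hi) if lo <= hi else None)
    return basins
```

With hypothetical cutoffs $\bar{r}_1 = 1/4$, $\bar{r}_2 = 3/4$ and private beliefs supported on $\{3/10, 7/10\}$, `myopic_basins([F(1, 4), F(3, 4)], F(3, 10), F(7, 10))` yields $J_1(0) = [0, 1/8]$, $J_2(0) = [7/16, 9/16]$ and $J_3(0) = [7/8, 1]$. The interior basin is nonempty here; Proposition 1 says such interior basins vanish once $\delta$ is large enough.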

Proposition 2 (Convergence of Beliefs, or Long-Run Learning)

• For unbounded private beliefs, $\pi_\infty$ is concentrated on the truth for any $\delta \in [0, 1)$.

• With bounded private beliefs, learning is incomplete for any $\delta \in [0, 1)$: unless $\pi_0 \in J_M(\delta)$, there is positive probability in state $H$ that $\pi_\infty$ is not in $J_M(\delta)$.

• The chance of incomplete learning with bounded private beliefs vanishes as $\delta \uparrow 1$.


Figure 1: Typical value function. Stylized graph of $v(\pi, \delta)$, $\delta > 0$, with three actions.
Proof: Given the absorbing basin characterization of Proposition 1, the result for unbounded beliefs follows from Lemma 1 (since $\pi_\infty$ places no weight on the point belief at the wrong state). The incomplete learning conclusion for bounded beliefs follows just as in Theorem 1 of SS. We now extend that proof to establish the limiting result for $\delta \uparrow 1$. First, Proposition 1 assures us that for $\delta$ close enough to 1, $\pi_\infty$ places all weight on $J_1(\delta)$ and $J_M(\delta)$. The likelihood ratio $\ell_n = (1 - \pi_n)/\pi_n$ is a martingale conditional on state $H$. Because all private beliefs $\sigma$ have likelihood ratio $(1 - \sigma)/\sigma$ bounded above by some $\bar{\ell} < \infty$, the sequence $(\ell_n)$ is bounded above by $\bar{\ell}(1 - \underline{\pi}(\delta))/\underline{\pi}(\delta)$, and the mean of $\ell_\infty$ must equal its prior mean $(1 - \pi_0)/\pi_0$. Since $\lim_{\delta \uparrow 1} \underline{\pi}(\delta) = 0$, the weight that $\pi_\infty$ places on $J_1(\delta)$ in state $H$ must vanish as $\delta \to 1$. □
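The last step of the proof can be spelled out as follows. This is one way to make the argument explicit, not a quotation of the original proof. Write $\ell_0 = (1-\pi_0)/\pi_0$ and use that $\ell = (1-\pi)/\pi$ is decreasing in $\pi$, so that $\ell_\infty \ge (1-\underline{\pi}(\delta))/\underline{\pi}(\delta)$ whenever $\pi_\infty \in J_1(\delta)$:

```latex
\begin{align*}
\ell_0 \;=\; \mathbb{E}[\ell_\infty \mid H]
  &\;\ge\; \Pr\big(\pi_\infty \in J_1(\delta) \mid H\big)\,
          \frac{1-\underline{\pi}(\delta)}{\underline{\pi}(\delta)}
  && \text{since } \ell_\infty \ge 0, \\
\Pr\big(\pi_\infty \in J_1(\delta) \mid H\big)
  &\;\le\; \ell_0\,\frac{\underline{\pi}(\delta)}{1-\underline{\pi}(\delta)}
  \;\longrightarrow\; 0 \qquad \text{as } \delta \uparrow 1 .
\end{align*}
```

The equality in the first line is where the uniform upper bound on $(\ell_n)$ enters: a bounded martingale converges in mean, so its limit keeps the prior mean.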

Observe how incomplete learning to some extent plagues even an extremely patient $\mathcal{EX}$. So this problem does not fall under the rubric of EK's Theorem 9, which shows that if the optimal value function $v$ is strictly convex in beliefs $\pi$, learning is complete for $\delta$ close enough to 1. For here, $\mathcal{EX}$ optimally behaves myopically for very extreme beliefs: $v(\pi) = u_{a_1}(\pi)$ for $\pi$ near 0, and $v(\pi) = u_{a_M}(\pi)$ for $\pi$ near 1, both affine functions. This points to the source of the incomplete learning: lumpy signals rather than impatience.
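The affine pieces are easiest to see in the myopic case. The Python sketch below computes the one-period value $v(\pi)$ for $\delta = 0$ only (the patient value function would require solving the dynamic program, which is not attempted here), under hypothetical primitives: three actions and binary private beliefs. For each private belief the machine picks the action maximizing $\pi\,\mu^H(\sigma)\,u(a,H) + (1-\pi)\,\mu^L(\sigma)\,u(a,L)$, which equals the chance of $\sigma$ times $u_a(\rho(\pi,\sigma))$.

```python
from fractions import Fraction as F

# Hypothetical primitives: binary private beliefs and three actions.
MU = {
    "H": {F(3, 10): F(3, 10), F(7, 10): F(7, 10)},
    "L": {F(3, 10): F(7, 10), F(7, 10): F(3, 10)},
}
PAYOFFS = [(F(0), F(2)), (F(3, 2), F(3, 2)), (F(2), F(0))]  # (u(a, H), u(a, L))


def myopic_value(pi):
    """One-period value v(pi) of the best decision rule (the delta = 0 case).

    Each term max_a [...] equals Pr(sigma | pi) * max_a u_a(rho(pi, sigma)).
    """
    return sum(
        max(pi * MU["H"][s] * uh + (1 - pi) * MU["L"][s] * ul for uh, ul in PAYOFFS)
        for s in MU["H"]
    )
```

For these primitives every private belief leads to $a_1$ when $\pi \le 1/8$, so there `myopic_value(pi)` coincides with the affine function $u_{a_1}(\pi) = 2(1-\pi)$; for example it returns `Fraction(9, 5)` at $\pi = 1/10$.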

4.3 Forward-Looking Informational Herding

Let us return to the view of this as the informational herding paradigm. Suppose first that everyone is altruistic, but subject to the standard herding informational constraints. If the realized payoff sequence is $(u_k)$, suppose that every individual $n = 1, 2, \ldots$ seeks to maximize $E\big[(1-\delta)\sum_{k=n}^{\infty} \delta^{k-n} u_k \,\big|\, \pi_n\big]$, where we define $E[f \mid \pi] = \sum_{\theta \in \Theta} \pi(\theta) f(\theta)$. Then the solution to $\mathcal{EX}$'s problem yields a perfect Bayesian equilibrium of this model.

The Planner's Problem. Now let us add a shade of economic realism, and posit selfish behavior. Reinterpret the patient experimenter's problem as that of an informationally constrained social planner $\mathcal{SP}$ trying to maximize the expectation of $(1-\delta)\sum_{n=1}^{\infty} \delta^{n-1} u_n$, the expected discounted average welfare of the individuals in the herding model. Observe how this perfectly aligns the objectives of $\mathcal{SP}$ and $\mathcal{EX}$. To respect the key herding informational assumption that actions are observed and signals unobserved, assume further that $\mathcal{SP}$ neither knows the state nor can observe the individuals' private signals, but can both observe and punish or reward any actions taken.

We are now positioned to reformulate the learning results of the last section at the level of actions. The Overturning Principle of SS generalizes to this case: Lemma A.2 establishes that for $\pi$ near $J_m(\delta)$, actions other than $a_m$ will push the updated public belief far from its current value. This at once yields the following corollary to Proposition 2.

Proposition 3 (Convergence of Actions, or Herds)

• For unbounded private beliefs, a herd eventually starts on the correct action.

• With bounded private beliefs, a herd on some action eventually starts. Unless $\pi_0 \in J_M(\delta)$, a herd arises on an action other than $a_M$ with positive chance in state $H$ for any $\delta \in [0, 1)$.

• The chance of an incorrect herd with bounded private beliefs vanishes as $\delta \uparrow 1$.
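A deterministic Python trace illustrates an incorrect herd in the simplest myopic case ($\delta = 0$, so this is the selfish benchmark rather than the planner's solution). Assumptions for illustration: two actions with a hypothetical posterior cutoff of 2/5 (take `a2` when the posterior exceeds it), binary private beliefs $\{3/10, 7/10\}$, and a fixed sequence of private beliefs, as if drawn in state $L$, rather than random draws.

```python
from fractions import Fraction as F

# Hypothetical binary private beliefs: MU[theta][sigma] is the chance of sigma.
MU = {
    "H": {F(3, 10): F(3, 10), F(7, 10): F(7, 10)},
    "L": {F(3, 10): F(7, 10), F(7, 10): F(3, 10)},
}
CUTOFF = F(2, 5)  # hypothetical: a2 is myopically optimal above this posterior


def posterior(pi, sigma):
    return pi * sigma / (pi * sigma + (1 - pi) * (1 - sigma))


def choose(pi, sigma):
    """Myopic choice given public belief pi and private belief sigma."""
    return "a2" if posterior(pi, sigma) > CUTOFF else "a1"


def simulate(pi, signals):
    """Public beliefs and actions along a fixed sequence of private beliefs."""
    beliefs, actions = [pi], []
    for sigma in signals:
        a = choose(pi, sigma)
        # psi(a | theta, x) for the current threshold rule, in each state.
        like = {th: sum(p for s, p in MU[th].items() if choose(pi, s) == a)
                for th in MU}
        pi = pi * like["H"] / (pi * like["H"] + (1 - pi) * like["L"])
        beliefs.append(pi)
        actions.append(a)
    return beliefs, actions
```

Starting from $\pi_0 = 1/2$, `simulate(F(1, 2), [F(7, 10), F(3, 10), F(3, 10)])` gives actions `['a2', 'a2', 'a2']` and public beliefs $1/2, 7/10, 7/10, 7/10$. After one misleading private belief (which has chance 3/10 in state $L$), both private beliefs lead to `a2`, the action is uninformative, and the public belief stops moving: an incorrect herd that no later signal can break.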

It is no surprise that $\mathcal{SP}$ ends up with full learning with unbounded beliefs, for even selfish individuals do. More interesting is that $\mathcal{SP}$ optimally incurs the risk of an everlasting incorrect herd. Herding is truly a robust property of the observational learning paradigm.

Optimal Transfers. How does $\mathcal{SP}$ steer the choices away from the myopic solution to $\mathcal{EX}$'s problem? Let $\mathcal{SP}$ tax or subsidize the actions according to the following simple scheme. Given the current public belief $\pi$, if the individual takes action $a$ he receives the (possibly negative) transfer $\tau(a \mid \pi)$. Under such incentives, our proof in Lemma A.1 that individuals optimally choose private belief threshold rules remains valid.

Given public belief $\pi$ and the optimal rule $x^*$ for $\mathcal{SP}$, how are the transfers set? The private belief $\sigma$ is mapped into the posterior $\rho(\pi, \sigma) = \pi\sigma / [\pi\sigma + (1-\pi)(1-\sigma)]$. The selfish herder's threshold $x_m$ must solve the indifference equation $u_{a_m}(\rho(\pi, x_m)) + \tau(a_m \mid \pi) = u_{a_{m+1}}(\rho(\pi, x_m)) + \tau(a_{m+1} \mid \pi)$. So the difference $\tau(a_{m+1} \mid \pi) - \tau(a_m \mid \pi)$ alone matters for how individuals trade off between the two actions, and $\mathcal{SP}$ can ensure that the threshold belief is optimally chosen, $x_m = x^*_m$, by suitably adjusting this difference.
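The indifference condition pins down each transfer difference directly: $\tau(a_{m+1} \mid \pi) - \tau(a_m \mid \pi) = u_{a_m}(\rho(\pi, x^*_m)) - u_{a_{m+1}}(\rho(\pi, x^*_m))$. A minimal Python sketch follows; the payoff table and the planner's targets $x^*_m$ are hypothetical inputs, since computing the optimal $x^*$ itself requires the planner's value function.

```python
from fractions import Fraction as F

# Hypothetical payoffs (u(a_m, H), u(a_m, L)) for three ordered actions.
PAYOFFS = [(F(0), F(2)), (F(3, 2), F(3, 2)), (F(2), F(0))]


def posterior(pi, sigma):
    return pi * sigma / (pi * sigma + (1 - pi) * (1 - sigma))


def u(m, rho):
    """Expected payoff u_{a_m}(rho)."""
    u_h, u_l = PAYOFFS[m]
    return rho * u_h + (1 - rho) * u_l


def transfer_differences(pi, targets):
    """tau(a_{m+1} | pi) - tau(a_m | pi) that make each target x*_m the threshold.

    targets: hypothetical planner thresholds x*_1 < ... < x*_{M-1}.
    Indifference at sigma = x*_m gives u_{a_m}(rho) - u_{a_{m+1}}(rho),
    with rho = rho(pi, x*_m).
    """
    return [u(m, posterior(pi, x)) - u(m + 1, posterior(pi, x))
            for m, x in enumerate(targets)]
```

At $\pi = 1/2$ with targets $x^* = (1/5, 4/5)$ the differences are $1/10$ and $-1/10$, so $a_2$ is favored relative to both neighbours; at the myopic thresholds $(1/4, 3/4)$ both differences are zero, as they should be.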

As incentives are unchanged if a constant is added to all $M$ transfers, we can also insist that $\mathcal{SP}$ achieve expected budget balance each period, i.e. that the expected contribution from everyone be zero: $0 = \sum_{m=1}^{M} \psi(a_m \mid \pi, x^*)\, \tau(a_m \mid \pi)$. This uniquely determines the transfers.
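Given the differences, the budget-balance condition fixes the free constant. The Python sketch below builds the transfers up to a constant and then subtracts their expectation; `action_probs` stands for $\psi(a_m \mid \pi, x^*)$ and is supplied as a hypothetical input rather than computed.

```python
from fractions import Fraction as F


def balanced_transfers(diffs, action_probs):
    """Transfers tau(a_1 | pi), ..., tau(a_M | pi) with the given differences
    tau(a_{m+1}) - tau(a_m) and zero expected contribution under action_probs.
    """
    assert sum(action_probs) == 1 and len(action_probs) == len(diffs) + 1
    raw = [F(0)]
    for d in diffs:
        raw.append(raw[-1] + d)  # transfers determined up to an additive constant
    mean = sum(p * t for p, t in zip(action_probs, raw))
    return [t - mean for t in raw]  # the constant that balances the budget
```

For differences $(1/10, -1/10)$ and hypothetical action chances $(1/4, 1/2, 1/4)$, this gives transfers $(-1/20, 1/20, -1/20)$, whose expectation is zero.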


We would dearly like to provide a crisp characterization of these transfers, but after much work, we believe that this is impossible. Clearly, $\mathcal{SP}$ will not tax or subsidize actions if the action he desires will be chosen anyway, i.e. for $\pi \in J_i(\delta) \subseteq J_i$. Conversely, if $\mathcal{SP}$ wishes actions to be chosen with discretion, as when $\pi \notin J_i(\delta)$, then some transfers will intuitively differ from zero, because myopic and patient behavior do not coincide. Rigorously, as $J_i(\delta) \subset J_i$ is a strict inclusion for high enough discount factors by Claim A.7, transfers are not identically zero for a sufficiently patient $\mathcal{SP}$. Beyond that, we can only say that 'experimentation' (making non-myopic choices) is rewarded, as it provides information to successors. What these 'high-information' choices are is rather hard to tell.

5. CONCLUSION

This paper has discovered and explored the fact that informational herding is simply incomplete learning by a single experimenter, suitably concealed. Yet herding has clearly achieved a measure of 'market penetration' that has eluded the experimentation literature. Aside from its simple economics, we believe this is due to its focus on the case of zero discounting, which avoids the heavy machinery of dynamic optimization.

Our mapping, recasting everything in rule space, has led us to an equivalent social planner's problem, an oft-performed exercise for economic models. The revelation principle in mechanism design also employs a 'rule machine'. But in multi-period, multi-person models with uncertainty, the planner must respect the agents' belief filtrations. Our rule machine approach succeeds by exploiting the martingale property of public beliefs.

This property extends beyond action observation models to those where only an imperfect signal of one's posterior belief is observed, again provided the entire history of such signals is observed. But once we relax the assumption that the history is perfectly observed, as in Smith and Sørensen (1996b), the public belief process (howsoever defined) ceases to be a martingale, a fact that we are exploring in a work in progress. As a result, a mapping of these models into the experimentation literature no longer appears possible.

Finally, we and others have correctly attributed bad herds to the lumpy transmission of social information (a finite action space). As in EK, incomplete learning at the very least requires an optimal action $x$ for which unfocused beliefs are invariant, i.e. the distribution $\psi(a|\theta,x)$ of signals $a$ is the same for all states. Such an invariance is clearly easier to satisfy with fewer available signals, and not surprisingly, herding and all published experimentation pathologies that we have seen assume a finite (vs. continuous) signal space. For instance, Rothschild (1974), McLennan (1984), and ABHJ's example are all binary signal models.
A. APPENDIX


A.1 Preliminary


A strategy $s_n$ for period $n$ is a map from $[0,1]$ to $X$. It prescribes the rule $x_n \in X$ which must be used, given belief $\pi_n$. The experimenter chooses a strategy profile $s = (s_1, s_2, \dots)$, which in turn determines the stochastic evolution of the model, i.e. a distribution over the sequences of realized actions, signals, payoffs, and beliefs.

Our analysis is inspired by ABHJ and sections 9.1-2 of Stokey and Lucas (1989). The value function $v(\cdot,\delta): [0,1] \to \mathbb{R}$ for the experimentation problem with discount factor $\delta$ is $v(\pi,\delta) = \sup_s \mathbb{E}\big[(1-\delta)\sum_{n=1}^{\infty}\delta^{n-1}u_n\big]$, where $u_n$ is the period-$n$ payoff and the expectation uses the distribution of processes implied by $s$. Recall that $u_m(\pi) = \pi u(a_m,H) + (1-\pi)u(a_m,L)$ denotes the expected payoff from $a_m$ at belief $\pi$. Since $u_m$ is affine, we have the Bellman operator $T_\delta$

$$T_\delta v(\pi) = \sup_{x \in X}\Big\{\sum_{a_m \in A}\psi(a_m|\pi,x)\big[(1-\delta)\,u_m(q(\pi,x,a_m)) + \delta\, v(q(\pi,x,a_m))\big]\Big\} \qquad \text{(A-1)}$$

where $q(\pi,x,a)$ is the Bayes-updated belief from $\pi$ when $a$ is observed and rule $x$ is applied. Note that for $v \ge v'$ we have $T_\delta v \ge T_\delta v'$. As is standard, $T_\delta$ is a contraction, and $v(\cdot,\delta)$ is its unique fixed point in the space of bounded, continuous, weakly convex functions.
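To make the operator (A-1) concrete, the following Python sketch runs value iteration on a grid of public beliefs. It is a hypothetical two-state, two-action example, not taken from this paper: it assumes payoffs $u(a_1,L) = u(a_2,H) = 1$ and $u(a_1,H) = u(a_2,L) = 0$ (so $u_1(\pi) = 1-\pi$, $u_2(\pi) = \pi$), and unbounded private beliefs $\sigma$ with densities $f_H(\sigma) = 2\sigma$ and $f_L(\sigma) = 2(1-\sigma)$ on $[0,1]$. Rules are restricted to single thresholds on a finite grid and $v$ is linearly interpolated between grid points, so the output only approximates $v(\cdot,\delta)$. Starting from $v_0 = \max_m u_m$, the iterates rise monotonically, as in Claim A.2.

```python
import numpy as np

# Hypothetical example (not from the paper): u_1(pi) = 1 - pi, u_2(pi) = pi.
# Private beliefs sigma have densities f_H = 2*sigma and f_L = 2*(1 - sigma)
# on [0, 1], so F_H(x) = x**2 and F_L(x) = 2*x - x**2 (unbounded beliefs).
PI = np.linspace(0.0, 1.0, 201)   # grid of public beliefs
X = np.linspace(0.0, 1.0, 401)    # threshold rules: take a_1 iff sigma < x


def bellman(v, delta):
    """Apply the operator T_delta of (A-1) to v, given by its values on PI."""
    pi = PI[:, None]
    FH, FL = X[None, :] ** 2, 2 * X[None, :] - X[None, :] ** 2
    p1 = pi * FH + (1 - pi) * FL                  # psi(a_1 | pi, x)
    p2 = pi * (1 - FH) + (1 - pi) * (1 - FL)      # psi(a_2 | pi, x)
    # Bayes-updated beliefs q(pi, x, a_m); an action with zero probability
    # contributes nothing, so its posterior is set to pi to avoid 0/0.
    q1 = np.where(p1 > 0, pi * FH / np.maximum(p1, 1e-300), pi)
    q2 = np.where(p2 > 0, pi * (1 - FH) / np.maximum(p2, 1e-300), pi)
    total = (p1 * ((1 - delta) * (1 - q1) + delta * np.interp(q1, PI, v))
             + p2 * ((1 - delta) * q2 + delta * np.interp(q2, PI, v)))
    return total.max(axis=1)                      # sup over the rule grid


def value(delta, tol=1e-12, max_iter=10_000):
    """Iterate T_delta from v_0 = max(u_1, u_2) until successive iterates agree."""
    v = np.maximum(1 - PI, PI)
    for _ in range(max_iter):
        v_new = bellman(v, delta)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


v0 = np.maximum(1 - PI, PI)
v_half, v_patient = value(0.5), value(0.9)
print(np.max(v_patient - v0))   # largest gain over the myopic frontier
```

With this setup one can check on the grid that $v(\cdot,0.9) \ge v(\cdot,0.5) \ge v_0$, in line with Claims A.2 and A.3, and that the computed values are convex.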

Lemma A.1 (Interval Structure) For any $\pi$, one optimal rule $x \in X$ is described by thresholds $0 = x_0 \le x_1 \le \dots \le x_M = 1$ such that action $a_m$ is taken when $\sigma \in (x_{m-1},x_m)$, and when $\sigma = x_m$ there is randomization between $a_m$ and $a_{m+1}$.

Proof: We prove that any rule $x$ violating this interval structure can be weakly improved upon. For $m_1 < m_2$, let $\Sigma_i$ ($i = 1, 2$) be those signals in $\Sigma$ which are mapped with positive probability into $a_{m_i}$. Assume to the contrary that the sets are not ordered as $\Sigma_1 \le \Sigma_2$.

If $q(\pi,x,a_{m_1}) > q(\pi,x,a_{m_2})$, then given our ordering of actions, payoffs are improved (with no loss of information) by remapping signals leading to $a_{m_1}$ into $a_{m_2}$, and vice versa.

Next, assume that $q(\pi,x,a_{m_1}) \le q(\pi,x,a_{m_2})$. For any $\bar{x} \in (0,1)$, define $\Sigma_1(\bar{x}) = (\Sigma_1 \cup \Sigma_2) \cap [0,\bar{x}]$ and $\Sigma_2(\bar{x}) = (\Sigma_1 \cup \Sigma_2) \cap [\bar{x},1]$. Consider then the modified rule $\hat{x}$ which equals $x$, except that $a_{m_1}$ is taken for signals in $\Sigma_1(\bar{x})$ and $a_{m_2}$ for signals in $\Sigma_2(\bar{x})$, where $\bar{x}$ satisfies $\psi(a_{m_1}|\pi,\hat{x}) = \psi(a_{m_1}|\pi,x)$, and hence also $\psi(a_{m_2}|\pi,\hat{x}) = \psi(a_{m_2}|\pi,x)$ (it may be necessary for $\hat{x}$ to randomize over the two actions at signal $\bar{x}$ to accomplish that). Since signals more in favor of state $L$ are mapped into $a_{m_1}$ under $\hat{x}$, we find $q(\pi,\hat{x},a_{m_1}) \le q(\pi,x,a_{m_1})$, and similarly $q(\pi,\hat{x},a_{m_2}) \ge q(\pi,x,a_{m_2})$. Thus, $\hat{x}$ implies a mean-preserving spread of the updated belief versus $x$, and since the value function is weakly convex, this weakly improves the value in the Bellman equation $T_\delta(v) = v$. □
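The mean-preserving-spread step can be checked with exact arithmetic. The Python sketch below is a hypothetical example, not from the paper: it assumes two actions, public belief $\pi = 1/2$, and private-belief densities $f_H(\sigma) = 2\sigma$, $f_L(\sigma) = 2(1-\sigma)$. It compares a non-interval rule ($a_1$ for $\sigma \in [1/4,3/4]$, $a_2$ at both extremes) with the interval rule ($a_1$ for $\sigma \in [0,1/2]$) that induces $a_1$ with the same probability. The function `v` is an arbitrary convex stand-in for the value function.

```python
from fractions import Fraction as Fr


def mu_H(lo, hi):
    """P(sigma in [lo, hi] | H) when f_H(sigma) = 2*sigma."""
    return hi**2 - lo**2


def mu_L(lo, hi):
    """P(sigma in [lo, hi] | L) when f_L(sigma) = 2*(1 - sigma)."""
    return (2 * hi - hi**2) - (2 * lo - lo**2)


def outcome(pi, a1_set):
    """(psi(a_1), q(a_1), psi(a_2), q(a_2)) when a_1 is taken for sigma in
    the interval a1_set = (lo, hi) and a_2 for all other sigma."""
    pH, pL = mu_H(*a1_set), mu_L(*a1_set)
    psi1 = pi * pH + (1 - pi) * pL
    psi2 = 1 - psi1
    return psi1, pi * pH / psi1, psi2, pi * (1 - pH) / psi2


def v(q):
    """A convex continuation value, used only for illustration."""
    return max(q, 1 - q)


pi = Fr(1, 2)
non_interval = outcome(pi, (Fr(1, 4), Fr(3, 4)))  # a_2 on both extremes
interval = outcome(pi, (Fr(0), Fr(1, 2)))         # a_1 below 1/2, a_2 above

for name, (p1, q1, p2, q2) in (("non-interval", non_interval), ("interval", interval)):
    print(name, p1, q1, p2, q2, p1 * v(q1) + p2 * v(q2))
```

Both rules induce $a_1$ with probability $1/2$. The non-interval rule leaves the belief at $1/2$ after either action, whereas the interval rule moves it to $1/4$ or $3/4$ with the same mean, raising the expected value of the convex function from $1/2$ to $3/4$.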
A.2 Proof of Proposition 1

The  proposition  is  established  in  a  series  of  claims. 

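Claim A.1 (Basins) For each $a_m \in A$, a possibly empty interval $\mathcal{J}_m(\delta)$ exists, such that when $\pi \in \mathcal{J}_m(\delta)$, the experimenter optimally chooses $x$ with $\mathrm{supp}(\mu) \subset [x_{m-1},x_m]$, i.e. $a_m$ occurs a.s. (learning stops). For any $\delta \in [0,1)$, $0 \in \mathcal{J}_1(\delta)$ and $1 \in \mathcal{J}_M(\delta)$.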

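Proof: For the first half, we really need only prove that $\mathcal{J}_m(\delta)$ must be an interval. If $\pi \in \mathcal{J}_m(\delta)$, then $a_m$ is the optimal choice, and the value is $v(\pi,\delta) = v_0(\pi) = u_m(\pi)$. Conversely, if $v(\pi,\delta) = v_0(\pi) = u_m(\pi)$, then $\pi \in \mathcal{J}_m(\delta)$ and $a_m$ is the optimal choice. As $u_m(\pi)$ is an affine function of $\pi$, and $v(\cdot,\delta)$ is weakly convex and weakly exceeds $u_m$, the set $\mathcal{J}_m(\delta)$ on which they coincide must be an interval.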

For the second half, $a_1$ is myopically strictly optimal for the focused belief $\pi_n = 0$, and since it updates to $\pi_{n+1} = \pi_n$ a.s. no matter which rule is applied, it is also dynamically optimal for any discount factor $\delta \in [0,1)$. A similar argument holds when $\pi_n = 1$. □

Define the expected utility frontier function $v_0$ by $v_0(\pi) = \max_m u_m(\pi)$, that is, the expected utility an individual would obtain by choosing the myopically optimal action.⁹

Claim A.2 (Iterates) The sequence $\{T_\delta^n v_0\}$ consists of weakly convex functions that are pointwise increasing, and converges to $v(\cdot,\delta)$. The value $v(\cdot,\delta)$ weakly exceeds $v_0$.

Proof: To maximize $\sum_{a_m \in A}\psi(a_m|\pi,x)\big[(1-\delta)u_m(q(\pi,x,a_m)) + \delta v_0(q(\pi,x,a_m))\big]$ over $x$ for given $\pi$, one possible policy $x$ ensures that the myopically optimal signal $a_m$ occurs with probability one. Then $q(\pi,x,a_m) = \pi$ a.s., resulting in value $v_0(\pi)$. Optimizing over all $x \in X$, we see that $T_\delta v_0(\pi) \ge v_0(\pi)$ for all $\pi$. By induction, $T_\delta^n v_0 \ge T_\delta^{n-1} v_0$, yielding (as per usual) a pointwise increasing sequence converging to the fixed point $v(\cdot,\delta) \ge v_0$. □

The following either is or ought to be a folk result for optimal experimentation, but we have not found a published or cited proof of it.¹⁰

Claim A.3 (Monotonicity) The value function weakly rises when $\delta$ increases: namely, for $\delta_1 > \delta_2$, $v(\pi,\delta_1) \ge v(\pi,\delta_2)$ for all $\pi$.

Proof: Clearly, $\sum_{a_m \in A}\psi(a_m|\pi,x)\,u_m(q(\pi,x,a_m)) \le \sum_{a_m \in A}\psi(a_m|\pi,x)\,v(q(\pi,x,a_m))$ for any $x$ and any function $v \ge v_0$. If $\delta_1 > \delta_2$, then $T_{\delta_1}v_0 \ge T_{\delta_2}v_0$, since there is more weight on the larger component of (A-1). Inductively, we have $T_{\delta_1}^n v_0 \ge T_{\delta_2}^n v_0$, since one possible policy under $\delta_1$ is to choose the $x$ optimal under $\delta_2$. Let $n \to \infty$ and apply Claim A.2. □

⁹ Observe how this differs from $v(\pi,0) = \sup_x \sum_m \psi(a_m|\pi,x)\,u_m(q(\pi,x,a_m))$. In other words, $v(\pi,0)$ allows the myopic individual to observe one signal $\sigma$ before obtaining the ex post value $v_0(p(\pi,\sigma))$.
¹⁰ But ABHJ do assert without proof (p. 625) that the patient value function exceeds the myopic one.
Claim A.4 (Inclusion) All basins weakly shrink when $\delta$ increases: in other words, $\forall a_m \in A$, if $1 > \delta_1 > \delta_2 \ge 0$, then $\mathcal{J}_m(\delta_1) \subseteq \mathcal{J}_m(\delta_2)$.

Proof: As seen in Claims A.3 and A.2, $v(\pi,\delta_1) \ge v(\pi,\delta_2) \ge v_0(\pi) \ge u_m(\pi)$ for all $\pi$ when $\delta_1 > \delta_2$. For $\pi \in \mathcal{J}_m(\delta_1)$, we know $v(\pi,\delta_1) = u_m(\pi)$ and thus $v(\pi,\delta_2) = u_m(\pi)$. The optimal value can thus be obtained by inducing $a_m$ a.s., so that $\pi \in \mathcal{J}_m(\delta_2)$. □
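The inclusion can be seen numerically. The Python sketch below is a hypothetical two-action example, not from the paper: it assumes $u_1(\pi) = 1-\pi$, $u_2(\pi) = \pi$, and bounded private beliefs on $[b, 1-b]$ with $b = 0.2$ and densities proportional to $\sigma$ in state $H$ and to $1-\sigma$ in state $L$. It approximates $v(\cdot,\delta)$ by value iteration on a belief grid, with rules restricted to single thresholds on a grid, and reports the right end of the computed basin $\mathcal{J}_1(\delta)$. Because of the grids, the endpoints are only accurate to about the grid spacing.

```python
import numpy as np

# Hypothetical example (not from the paper): u_1(pi) = 1 - pi, u_2(pi) = pi,
# and bounded private beliefs on [b, 1 - b] with f_H proportional to sigma
# and f_L proportional to 1 - sigma.
b = 0.2
PI = np.linspace(0.0, 1.0, 201)   # grid of public beliefs
X = np.linspace(0.0, 1.0, 401)    # threshold rules: take a_1 iff sigma < x


def cdfs(x):
    """CDFs F_H(x), F_L(x) of the private belief sigma."""
    z = np.clip(x, b, 1 - b)
    FH = (z**2 - b**2) / (1 - 2 * b)
    FL = (2 * (z - b) - (z**2 - b**2)) / (1 - 2 * b)
    return FH, FL


def value(delta, tol=1e-12, max_iter=10_000):
    """Approximate v(., delta) by iterating T_delta of (A-1) from v_0."""
    pi = PI[:, None]
    FH, FL = cdfs(X[None, :])
    p1 = pi * FH + (1 - pi) * FL                  # psi(a_1 | pi, x)
    p2 = pi * (1 - FH) + (1 - pi) * (1 - FL)      # psi(a_2 | pi, x)
    q1 = np.where(p1 > 0, pi * FH / np.maximum(p1, 1e-300), pi)
    q2 = np.where(p2 > 0, pi * (1 - FH) / np.maximum(p2, 1e-300), pi)
    v = np.maximum(1 - PI, PI)                    # v_0
    for _ in range(max_iter):
        total = (p1 * ((1 - delta) * (1 - q1) + delta * np.interp(q1, PI, v))
                 + p2 * ((1 - delta) * q2 + delta * np.interp(q2, PI, v)))
        v_new = total.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def basin_1_end(delta, tol=1e-8):
    """Right end of the largest grid interval [0, pi] on which v = u_1."""
    in_basin = value(delta) - (1 - PI) <= tol
    k = len(PI) if in_basin.all() else int(np.argmin(in_basin))
    return PI[k - 1]


for delta in (0.0, 0.5, 0.9):
    print(delta, basin_1_end(delta))
```

In this example the myopic basin is $[0,b]$, since from $\pi \le b$ even the strongest private belief $1-b$ cannot push the posterior above $1/2$; the computed endpoints should weakly fall as $\delta$ rises, while staying strictly positive as Claim A.6 asserts.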

Claim A.5 (Unbounded Beliefs) With unbounded private beliefs, only the basins for the extreme actions are non-empty, with $\mathcal{J}_1(\delta) = \{0\}$ and $\mathcal{J}_M(\delta) = \{1\}$; all other $\mathcal{J}_m(\delta)$ are empty.
Proof: SS establish for the myopic model that all $\mathcal{J}_m(0)$ are empty, except for $\mathcal{J}_1(0) = \{0\}$ and $\mathcal{J}_M(0) = \{1\}$. Now apply Claims A.1 and A.4. □

Claim A.6 (Bounded Beliefs) If the private beliefs are bounded, then $\mathcal{J}_1(\delta) = [0,\underline{\pi}(\delta)]$ and $\mathcal{J}_M(\delta) = [\bar{\pi}(\delta),1]$, where $0 < \underline{\pi}(\delta) < \bar{\pi}(\delta) < 1$.

Proof: We prove that for sufficiently low beliefs it is optimal to choose a rule $x$ such that $a_1$ occurs with probability one; the argument for high beliefs is very similar. Since action $a_1$ is optimal at belief $\pi = 0$, and is not weakly dominated, there must be some positive-length interval $[0,\hat{\pi}]$ on which $u_1(\pi) = v_0(\pi)$, i.e. $a_1$ is the optimal choice for beliefs $\pi \le \hat{\pi}$. Moreover, since each $u_m$ is affine, $\exists\, u > 0$ such that $u_1(\pi) \ge u_m(\pi) + u$ for all $m \ne 1$ and for all beliefs $\pi$ in the interval $[0,\hat{\pi}/2]$.

No observed action $a \in A$ can ever convey a stronger signal than the most extreme private belief in $\mathrm{supp}(\mu) \subset [\underline{\sigma},\bar{\sigma}] \subset (0,1)$. So any initial belief $\pi$ is updated to at most $q(\pi) = \pi\bar{\sigma}/[\pi\bar{\sigma} + (1-\pi)(1-\bar{\sigma})]$. For $\pi$ sufficiently small, $q(\pi) \in [0,\hat{\pi}/2]$ and $q(\pi) - \pi$ is arbitrarily small. By the continuity of $v$, $v(q(\pi),\delta) - v(\pi,\delta)$ is then arbitrarily small; in particular, it is less than $u(1-\delta)/\delta$ for small enough $\pi$. The Bellman equation $T_\delta(v) = v$ corresponding to (A-1) reveals that it is strictly suboptimal to risk any outcome other than $a_1$ for such small beliefs. □
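The bound $q(\pi)$ is easy to tabulate. In this Python sketch, $\bar{\sigma} = 0.8$ is a hypothetical upper bound on private beliefs. A line of algebra gives the gap in closed form, $q(\pi) - \pi = \pi(1-\pi)(2\bar{\sigma}-1)/[\pi\bar{\sigma} + (1-\pi)(1-\bar{\sigma})]$, which vanishes as $\pi \to 0$; this is what drives the argument above.

```python
def max_update(pi, sigma_bar):
    """Highest posterior reachable from public belief pi in one period when
    private beliefs never exceed sigma_bar (no action is more informative)."""
    return pi * sigma_bar / (pi * sigma_bar + (1 - pi) * (1 - sigma_bar))


def gap(pi, sigma_bar):
    """Closed form of max_update(pi, sigma_bar) - pi."""
    return pi * (1 - pi) * (2 * sigma_bar - 1) / (
        pi * sigma_bar + (1 - pi) * (1 - sigma_bar))


sigma_bar = 0.8  # hypothetical bound on private beliefs
for pi in (0.1, 0.01, 0.001):
    print(pi, max_update(pi, sigma_bar) - pi)
```

With $\bar{\sigma} = 0.8$ the gap equals $3\pi(1-\pi)/(1+3\pi)$, so it never exceeds $3\pi$.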

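Claim A.7 (Limiting Patience) For large enough $\delta$, all basins disappear except for $\mathcal{J}_1(\delta)$ and $\mathcal{J}_M(\delta)$, while $\lim_{\delta\to1}\mathcal{J}_1(\delta) = \{0\}$ and $\lim_{\delta\to1}\mathcal{J}_M(\delta) = \{1\}$.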

Proof: Fix $\delta \in [0,1)$ and an action index $m$ ($1 < m < M$) for which $\mathcal{J}_m(\delta) = [\pi_1,\pi_2]$, for some $0 < \pi_1 \le \pi_2 < 1$. Since there are informative private beliefs, $\exists\, x^* \in (1/2,1)$ with $1 > \mu^H([x^*,1]) > \mu^L([x^*,1]) > 0$. We shall consider the alternative policy $x$ with interval boundaries $x_{m-1} = 0$, $x_m = x^*$, $x_{m+1} = 1$ (see Lemma A.1).

Updating $\pi$ with observation of the event $\{\sigma \in [x^*,1]\}$ yields the posterior belief $q(\pi) = \pi\mu^H([x^*,1])/[\pi\mu^H([x^*,1]) + (1-\pi)\mu^L([x^*,1])]$ in state $H$. For any closed subinterval $I \subset (0,1)$, in particular one with $I \supset \mathcal{J}_m(\delta)$, there exists $\varepsilon = \varepsilon(I) > 0$ with $q(\pi) - \pi \ge \varepsilon$ for all $\pi \in I$. By definition of $\varepsilon$, $q$ maps the interval $[\pi_2 - \varepsilon/2, \pi_2]$ into (but not necessarily onto) $[\pi_2 + \varepsilon/2, 1]$. Choose $\bar{u} > 0$ so large that $u_m(\pi) \le u_{m+1}(\pi) + \bar{u}$ for all $\pi \in [0,1]$. Since $v(\pi,\delta) > u_m(\pi)$ outside $\mathcal{J}_m(\delta) = [\pi_1,\pi_2]$, and both are continuous in $\pi$, we may also choose $\underline{u} > 0$ so small that $v(\pi,\delta) \ge u_m(\pi) + \underline{u}$ for all $\pi \in [\pi_2 + \varepsilon/2, 1]$. By Claim A.3, we thus have $v(\pi,\delta') \ge u_m(\pi) + \underline{u}$ for all $\delta' > \delta$. If $\delta' > \delta$ is so large that $(1-\delta')\bar{u} < \delta'\underline{u}$, then the Bellman equation $T_{\delta'}(v) = v$ reveals that our suggested policy $x$ beats inducing $a_m$ a.s. when $\pi \in [\pi_2 - \varepsilon/2, \pi_2]$. By iterating this procedure a finite number of times, each time excising length $\varepsilon/2$ from the interval $\mathcal{J}_m(\delta)$, we see that $\mathcal{J}_m(\delta)$ evaporates for large enough $\delta$.
If $m = 1$ or $m = M$, this procedure can still be applied repeatedly, to show that $\mathcal{J}_m(\delta) \cap I$ vanishes for large enough $\delta$ for any closed $I \subset (0,1)$. □
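The uniform jump $\varepsilon(I)$ can be computed directly. This Python sketch assumes, purely for illustration, private-belief densities $f_H(\sigma) = 2\sigma$ and $f_L(\sigma) = 2(1-\sigma)$, for which $\mu^H([x^*,1]) = 1 - x^{*2}$ and $\mu^L([x^*,1]) = (1-x^*)^2$, together with the hypothetical choices $x^* = 0.75$ and $I = [0.05, 0.95]$. The minimum of $q(\pi) - \pi$ over a fine grid of $I$ approximates $\varepsilon(I)$: it is positive, but it shrinks toward zero as $I$ widens toward $(0,1)$.

```python
import numpy as np


def jump(pi, mu_H, mu_L):
    """q(pi) - pi after observing the event {sigma >= x*}, where mu_H and
    mu_L are the probabilities of that event in states H and L."""
    q = pi * mu_H / (pi * mu_H + (1 - pi) * mu_L)
    return q - pi


x_star = 0.75                                   # hypothetical, in (1/2, 1)
mu_H, mu_L = 1 - x_star**2, (1 - x_star) ** 2   # under f_H = 2s, f_L = 2(1 - s)
I = np.linspace(0.05, 0.95, 1001)               # closed subinterval of (0, 1)
eps = jump(I, mu_H, mu_L).min()
print(eps)
```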

A.3 Proof of Proposition 3

In light of the interval structure found in Lemma A.1, ST's problem is simply to determine the chances $\psi(a_m|\pi)$ with which each action should be chosen (or equivalently, what fraction of the signal space maps into each action). Thus, the choice set is WLOG the compact $M$-simplex $\Delta(A)$ (that is, the same strategy space as in the original observational learning model). The objective function in the Bellman equation corresponding to (A-1) is continuous in this choice vector and in $\pi$, and it follows from the Theorem of the Maximum (e.g. Theorem I.B.3 of Hildenbrand (1974)) that the non-empty correspondence of optimal rules is upper hemi-continuous in $\pi$.¹¹
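Under the interval structure, a target vector of action probabilities pins down the thresholds, which is why the choice set can be taken to be $\Delta(A)$. A Python sketch of this inversion follows, again assuming hypothetical densities $f_H(\sigma) = 2\sigma$ and $f_L(\sigma) = 2(1-\sigma)$, so that the unconditional CDF of $\sigma$ at public belief $\pi$ is $G_\pi(x) = \pi x^2 + (1-\pi)(2x - x^2)$. Thresholds are found by bisection; atoms, which would call for randomization at a threshold, are ignored here.

```python
from itertools import accumulate


def thresholds(pi, probs, tol=1e-12):
    """Thresholds 0 = x_0 <= x_1 <= ... <= x_M = 1 of the interval rule that
    takes action a_m with unconditional probability probs[m - 1] at belief pi."""

    def G(x):  # unconditional CDF of the private belief at public belief pi
        return pi * x**2 + (1 - pi) * (2 * x - x**2)

    xs = [0.0]
    for target in list(accumulate(probs))[:-1]:
        lo, hi = 0.0, 1.0
        while hi - lo > tol:          # bisection: G is increasing on [0, 1]
            mid = 0.5 * (lo + hi)
            if G(mid) < target:
                lo = mid
            else:
                hi = mid
        xs.append(0.5 * (lo + hi))
    xs.append(1.0)
    return xs


print(thresholds(0.5, [0.25, 0.5, 0.25]))  # G is the identity at pi = 1/2
```

At $\pi = 1/2$ the CDF is the identity, so the thresholds come out at roughly $0$, $0.25$, $0.75$, $1$.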

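Lemma A.2 (Overturning Principle) For all $\delta \in [0,1)$ and $a_m \in A$ with $\mathcal{J}_m(\delta) \ne \emptyset$, there exist $\varepsilon > 0$ and an open interval $K \supset \mathcal{J}_m(\delta)$ such that $\forall \pi \in K$ and $\forall a \ne a_m$, $|q(\pi,x,a) - \pi| > \varepsilon$ when $x$ is the optimal rule.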

Proof: For $\pi \in \mathcal{J}_m(\delta)$ we know that the only optimal rule is to induce $a_m$ a.s. As the correspondence of optimal rules is u.h.c. in $\pi$, the optimal rule for $\pi$ near $\mathcal{J}_m(\delta)$ induces actions other than $a_m$ for only the most extreme portion of $\mathrm{supp}(\mu)$. With bounded beliefs, $\pi$ is bounded away from 0 and 1, and so it is immediate that the public belief must jump at least some $\varepsilon > 0$ upon observing any action other than $a_m$.

With unbounded beliefs, an extra step is needed. Assume WLOG that $\pi$ is near 0. Then $q(\pi,x,a_1) \le \pi$, so the updated belief is close to the current one, and $v(q(\pi,x,a_1),\delta) - v(\pi,\delta)$ is near zero. Choosing an action other than $a_1$ implies a boundedly positive immediate loss, which must be compensated by gained future value (again, by the Bellman equation). Since $v$ is continuous, it follows that $q(\pi,x,a)$ must be boundedly higher than $\pi$ for any action $a \ne a_1$ taken with positive probability. □


¹¹ The optimal rule is a point-valued function except when there is an atom on the interval boundary.



References 


Aghion,  P.,  P.  Bolton,  C.  Harris,  and  B.  Jullien  (1991):  "Optimal  Learning  by 
Experimentation,"  Review  of  Economic  Studies,  58(196),  621-654. 

Banerjee, A. V. (1992): "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107, 797-817.

Bikhchandani, S., D. Hirshleifer, and I. Welch (1992): "A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades," Journal of Political Economy, 100, 992-1026.

Easley, D., and N. Kiefer (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064.

Hildenbrand,  W.  (1974):  Core  and  Equilibria  of  a  Large  Economy.  Princeton  University 
Press,  Princeton. 

Lee,  I.  H.  (1993):  "On  the  Convergence  of  Informational  Cascades,"  Journal  of  Economic 
Theory,  61,  395-411. 

McLennan,  A.  (1984):  "Price  Dispersion  and  Incomplete  Learning  in  the  Long  Run," 
Journal  of  Economic  Dynamics  and  Control,  7,  331-347. 

Rothschild,  M.  (1974):  "A  Two-Armed  Bandit  Theory  of  Market  Pricing,"  Journal  of 
Economic  Theory,  9,  185-202. 

Rudin,  W.  (1987):  Real  and  Complex  Analysis.  McGraw-Hill,  New  York,  3rd  edn. 

Smith, L., and P. Sørensen (1996a): "Pathological Outcomes of Observational Learning," MIT Working Paper.

Smith, L., and P. Sørensen (1996b): "Rational Social Learning Through Random Sampling," MIT mimeo.


Stokey, N. L., and R. E. Lucas (1989): Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Mass.




